Next generation phenotyping for diagnosis and phenotype–genotype correlations in Kabuki syndrome

The field of dysmorphology has been changed by the use of Artificial Intelligence (AI) and the development of Next Generation Phenotyping (NGP). The aim of this study was to propose a new NGP model for predicting KS (Kabuki Syndrome) on 2D facial photographs and to distinguish KS1 (KS type 1, KMT2D-related) from KS2 (KS type 2, KDM6A-related). We included retrospectively and prospectively, from 1998 to 2023, all frontal and lateral pictures of patients with a molecular confirmation of KS. After automatic preprocessing, we extracted geometric and textural features. After incorporation of age, gender, and ethnicity, we used XGBoost (eXtreme Gradient Boosting), a supervised machine learning classifier. The model was tested on an independent validation set. Finally, we compared the performances of our model with DeepGestalt (Face2Gene). The study included 1448 frontal and lateral facial photographs from 6 centers, corresponding to 634 patients (527 controls, 107 KS); 82 (78%) of KS patients had a variation in the KMT2D gene (KS1) and 23 (22%) in the KDM6A gene (KS2). We were able to distinguish KS from controls in the independent validation group with an accuracy of 95.8% (78.9–99.9%, p < 0.001) and distinguish KS1 from KS2 with an empirical Area Under the Curve (AUC) of 0.805 (0.729–0.880, p < 0.001). We report an automatic detection model for KS with high performances (AUC 0.993 and accuracy 95.8%). We were able to distinguish patients with KS1 from KS2, with an AUC of 0.805. These results outperform the current commercial AI-based solutions and expert clinicians.


Materials and methods
The study was approved by the Comité Éthique et Scientifique pour les Recherches, les Études et les Évaluations dans le domaine de la Santé (CESREES), №4570023bis, the Commission Nationale Informatique et Libertés (CNIL), №MLD/MFI/AR221900, and the Institutional Review Board, Faculty of Medicine, Chulalongkorn University (IRB 264/62), and was conducted in accordance with the 1964 Helsinki declaration and its later amendments. Informed and written consents were obtained from the legal representatives of each child or from the patients themselves if they were of age.

Photographic dataset
We included most pictures from the photographic database of the Maxillofacial surgery and Plastic surgery department of Hôpital Necker-Enfants Malades (Assistance Publique-Hôpitaux de Paris), Paris, France. This database contains 594,000 photographs from 22,000 patients, and all pictures since 1995 were taken by a professional medical photographer using a Nikon D7000 device in standardized positions.
We included retrospectively and prospectively, from 1995 to 2023, all frontal and lateral pictures of patients diagnosed with KS. The photographs were not calibrated. All patients had genetic confirmation of KS (KMT2D or KDM6A). We excluded all photographs taken after any surgical procedure that could have modified the craniofacial morphology. Multiple photographs per patient corresponded to different ages of follow-up. Duplicates were excluded.
Controls were selected among patients admitted for lacerations, trauma, infection and various skin lesions, without any record of chronic conditions. More precisely, follow-up for any type of chronic disease was considered as an exclusion criterion. The reports were retrieved using the local data warehouse Dr Warehouse 24. For each patient, the best lateral view was included.

Validation set
For designs №1 and №2, we randomly selected a group of individuals corresponding to 10% of the number of patients with KS, and the equivalent number of control patients. These patients were removed from the training set. The two sets were therefore independent.
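A patient-level hold-out of this kind can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the patient identifiers, the rounding rule and the random seed are all assumptions, and the exact group sizes used in the study may differ.

```python
import random

def split_validation(ks_ids, control_ids, fraction=0.10, seed=0):
    """Hold out `fraction` of the KS patients and an equal number of controls.

    The split is done on patient identifiers, so every photograph of a
    held-out patient stays out of the training set.
    """
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    n_val = max(1, round(fraction * len(ks_ids)))
    val_ks = set(rng.sample(sorted(ks_ids), n_val))
    val_controls = set(rng.sample(sorted(control_ids), n_val))
    validation = val_ks | val_controls
    training = (set(ks_ids) | set(control_ids)) - validation
    return training, validation

# Hypothetical identifiers matching the cohort sizes given in the text.
ks = [f"KS{i:03d}" for i in range(107)]
controls = [f"C{i:03d}" for i in range(527)]
train, val = split_validation(ks, controls)
```

Splitting on patients rather than on photographs is what keeps the two sets independent when several photographs exist per patient.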

Landmarking
We used three different templates based on 105 landmarks for the frontal views, 73 for the lateral views and 41 for the external ear pictures. We developed an automatic annotation model for each template following a pipeline including: (1) detection of the Region Of Interest (ROI) and (2) automatic placement of the landmarks.
For ROI detection, the procedure was optimized and split into two stages: (1) ROI detection and (2) determination of profile laterality.
(1) ROI detection: a Faster Region-based Convolutional Neural Network (RCNN) model was trained after data augmentation (images and their +10° and −10° rotations), with a learning rate of 0.001, a batch size of 4, a gamma of 0.05 and 2000 iterations.
(2) Determination of profile laterality: a pre-trained ResNet50 network 25 was used with the PyTorch library 26. The training images included 1570 left profiles and 1579 right profiles. The batch size was 16, and an Adam optimizer 27 was used with a learning rate of 0.001, a step of 7, and a gamma of 0.1, over 25 epochs.
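To make the laterality training settings concrete, the sketch below computes the learning rate per epoch, assuming that the "step of 7" and "gamma of 0.1" describe a step-decay schedule (as implemented, for instance, by PyTorch's `StepLR` scheduler). That reading is an assumption, and the function name is hypothetical.

```python
def step_decay_lr(base_lr, step_size, gamma, epoch):
    """Learning rate at a 0-indexed epoch: multiplied by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

# Values taken from the text: lr = 0.001, step = 7, gamma = 0.1, 25 epochs.
schedule = [step_decay_lr(0.001, 7, 0.1, epoch) for epoch in range(25)]
```

Under this assumption, the rate is divided by ten at epochs 7, 14 and 21.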
For the automatic placement of landmarks, we used a patch-based Active Appearance Model (AAM) using the menpo library on Python 3.7 28. We have previously reported the relevance of this approach 29. We used two-scale landmarking: the model for frontal pictures was trained on 904 manually annotated photographs, with a first stage of dimensioning (diagonal = 150), a patch shape of [(15, 15), (23, 23)] and 50 iterations and a second stage without resizing, with a patch shape of [(20, 20), (30, 30)] and 10 new iterations. The model for profile pictures was trained on 1439 manually annotated photographs, with a first stage of dimensioning (diagonal = 150), a patch shape of [(15, 15), (23, 23)] and 25 iterations and a second stage without resizing, with a patch shape of [(15, 15), (23, 23)] and 5 new iterations. The model for ears was trained on 1221 manually annotated photographs, with a first stage of dimensioning (diagonal = 100), a patch shape of [(15, 15), (23, 23)] and 50 iterations and a second stage without resizing, with a patch shape of [(20, 20), (30, 30)] and 20 new iterations. All three models used the Lucas-Kanade optimizer 30.
Each automatically annotated photograph was checked by two authors blinded to the diagnosis, QH and MD, and landmarks were manually re-positioned when necessary, using landmarker.io 31. The Intraclass Correlation Coefficient (ICC) was computed between the raters. ICC values greater than 0.9 corresponded to excellent reliability of the manual annotation 32.
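The text does not state which ICC form was used. As an illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from its usual mean-square definition; the choice of this form, the function name and the example coordinates are assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per target and one column per rater."""
    n = len(ratings)          # number of targets (e.g. landmark coordinates)
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # between-target mean square
    msc = ss_cols / (k - 1)                 # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical x-coordinates of four landmarks placed by two raters.
example = [[10.0, 10.2], [20.0, 19.9], [30.0, 30.1], [40.0, 40.0]]
icc = icc_2_1(example)
```

With near-identical placements such as these, the coefficient falls above the 0.9 threshold quoted in the text.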

Geometric morphometrics
We performed Generalized Procrustes Analysis (GPA) 33 on all landmark clouds using the geomorph package in R 34. Since the data were uncalibrated photographs, ROI sizes were not available: only shape parameters were assessed, not centroid sizes. Procrustes coordinates were processed using Principal Component Analysis (PCA) for dimension reduction. We retained the principal components whose cumulative sum explained 99% of the total variance. The last 1% was considered as negligible information.
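The study used the geomorph package in R; the sketch below is a Python stand-in, under the assumption of 2D landmarks stored as complex numbers, that shows the two ideas of this section: superimposition that removes translation, scale and rotation, and selection of the leading components up to a cumulative variance threshold. All function names and example values are hypothetical.

```python
import cmath
import math

def center_scale(shape):
    """Remove translation and scale a landmark set to unit centroid size."""
    centroid = sum(shape) / len(shape)
    centered = [z - centroid for z in shape]
    size = math.sqrt(sum(abs(z) ** 2 for z in centered))
    return [z / size for z in centered]

def rotate_to(shape, ref):
    """Rotate `shape` to minimise the summed squared distance to `ref`."""
    s = sum(w.conjugate() * z for z, w in zip(shape, ref))
    rotation = s.conjugate() / abs(s)   # unit complex number
    return [rotation * z for z in shape]

def gpa(shapes, n_iter=10):
    """Iteratively align all shapes to their running mean shape."""
    aligned = [center_scale(s) for s in shapes]
    mean = aligned[0]
    for _ in range(n_iter):
        aligned = [rotate_to(s, mean) for s in aligned]
        mean = center_scale([sum(col) / len(aligned) for col in zip(*aligned)])
    return aligned, mean

def n_components(variances, threshold=0.99):
    """Smallest number of leading components reaching the cumulative threshold."""
    total = sum(variances)
    cumulative = 0.0
    for count, v in enumerate(sorted(variances, reverse=True), start=1):
        cumulative += v
        if cumulative / total >= threshold:
            return count
    return len(variances)

# One hypothetical shape seen at three positions, scales and orientations.
base = [0j, 2 + 0j, 2 + 1j, 0.5 + 1.5j]
transforms = [(1, 0), (2 * cmath.exp(0.7j), 3 - 1j), (0.5 * cmath.exp(-1.2j), -4 + 2j)]
copies = [[a * z + b for z in base] for a, b in transforms]
aligned, mean_shape = gpa(copies)
kept = n_components([60, 30, 9.5, 0.4, 0.1])
```

Because the three copies differ only by translation, scale and rotation, they coincide after superimposition, which is why centroid size carries no information here.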

Texture extraction
We partitioned the frontal and profile pictures into key areas and applied textural feature extraction methods to each zone, which allowed us to check the results and determine which zone had contributed most to the diagnosis.
We defined 14 key areas that could potentially contribute to diagnosis: 11 on frontal views (right/left eyes, right/left eyebrows, glabella, forehead, nasal tip, philtrum, right/left cheeks, and chin) and 3 on lateral views (pre-auricular region, eye, and zygoma relief). Each zone was extracted automatically using the previously placed landmarks.
We used the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm for histogram equalization, as previously reported before the use of feature extractors 35,36. CLAHE enhanced contrast by evenly dispersing gray values 37 and by reducing the influences of illumination during picture capture and of skin color. Kiflie et al. recommended CLAHE as a first choice equalization method 38.
Gray-Level Co-occurrence Matrix (GLCM) methods, as proposed by Haralick 39, are based on the estimation of the second-order joint conditional probability density functions, which characterize the spatial relationships between pixels. GLCM is commonly used in texture analysis 40,41, for instance in radiomics on CT-scan or MRI images 42–44 or for skin texture assessment 45. In GLCM, the co-occurrence matrix contains information on entropy, homogeneity, contrast, energy and correlation between pixels. GLCM includes 28 features, taking into account the average and range for each item of information and for each zone, representing 28 × 14 = 392 textural features for each patient.
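As a minimal sketch of how a co-occurrence matrix yields texture descriptors, the code below builds a symmetric, normalised GLCM for a single horizontal offset on a small four-level patch and derives four of the quantities named above. It is not the authors' pipeline: the function names, the offset, the number of grey levels and the inverse-difference-moment definition of homogeneity are assumptions.

```python
import math

def glcm(image, dx=1, dy=0, levels=4):
    """Symmetric, normalised grey-level co-occurrence matrix for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                a, b = image[y][x], image[y2][x2]
                counts[a][b] += 1
                counts[b][a] += 1  # count each pixel pair in both directions
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]

def texture_features(p):
    """Contrast, energy, homogeneity and entropy of a normalised GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }

# Hypothetical 4 x 4 patch with grey levels 0 to 3.
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = texture_features(glcm(patch))

# Size of the textural feature vector: 28 features for each of the 14 zones.
n_textural_features = 28 * 14
```

In practice such descriptors are computed over several offsets and directions, and the average and range across directions give the two values per item mentioned in the text.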

Stratification using metadata
The textural features and the geometric principal components were combined for further analysis. To account for the associated metadata (age, gender and ethnicity) and for the fact that we included more than one photograph per patient (that is, the non-independence of the data), a mixed model was designed for each feature. The variables to be explained were the features (geometric and textural), with age, gender and ethnicity considered as explanatory variables. A random effect on age and individuals was introduced. In the mixed model, age.β1,i corresponded to a random slope for age per individual, and ε i,j was a random error term. We did not use an interaction term between age and gender or between age and ethnicity, as it did not increase the likelihood of the model. Age, gender and ethnicity are significant factors in dysmorphology 46,47.
The residuals of each feature were computed to account for potential biases linked to the metadata.
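The model equation itself is not reproduced in this text. A form consistent with the description (fixed effects for age, gender and ethnicity, a random slope for age per individual, and an error term) could be written as below, for feature value y of photograph j of individual i. This is a sketch under assumptions: only age.β1,i and ε i,j are named in the text, while α, β1, β2, β3 and the per-individual random intercept β0,i are notation introduced here for illustration.

```latex
\begin{aligned}
y_{i,j} &= \alpha + \beta_{1}\,\mathrm{age}_{i,j} + \beta_{2}\,\mathrm{gender}_{i}
          + \beta_{3}\,\mathrm{ethnicity}_{i}
          + \beta_{0,i} + \beta_{1,i}\,\mathrm{age}_{i,j} + \varepsilon_{i,j}, \\
\hat{\varepsilon}_{i,j} &= y_{i,j} - \hat{y}_{i,j}.
\end{aligned}
```

The second line is the residual of the fitted model, that is, the part of each feature not explained by the metadata.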

Classification model
The inputs to the model were the residuals from the linear models described above, for each geometric or textural feature. We used eXtreme Gradient Boosting (XGBoost), a supervised machine learning classifier, for all the analyses 48. We chose a tree-based booster, and the loss function to be minimized was a logistic regression for binary classification. We set several hyperparameters to improve the performance and effect of the machine learning model: learning rate = 0.3, gamma = 0, maximum tree depth = 6. The model with the lowest error rate was chosen for analysis. We separated the dataset into a training set and a testing set, and a five-fold cross-validation was used to define the ideal number of iterations to avoid overfitting. The chosen model with the ideal number of iterations was then used on the independent validation set to test performances, by plotting accuracy and AUC. The Receiver Operating Characteristics (ROC) curves were plotted in R using the plotROC package 49. We used the DeepGestalt tool proposed by Face2Gene CLINIC on our validation set, to be able to compare its performance (accuracies).
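The settings above can be written with XGBoost's own parameter names, and the choice of the number of iterations can be sketched as picking the boosting round with the lowest mean cross-validation error. The helper function and the per-fold error values below are hypothetical and purely illustrative; they are not results from the study.

```python
# Hyperparameters stated in the text, expressed with XGBoost parameter names.
params = {
    "booster": "gbtree",             # tree-based booster
    "objective": "binary:logistic",  # logistic loss for binary classification
    "eta": 0.3,                      # learning rate
    "gamma": 0,
    "max_depth": 6,
}

def best_iteration(fold_errors):
    """Return the 1-based round with the lowest mean error across folds."""
    n_rounds = len(fold_errors[0])
    mean_errors = [
        sum(fold[r] for fold in fold_errors) / len(fold_errors)
        for r in range(n_rounds)
    ]
    best = min(range(n_rounds), key=mean_errors.__getitem__) + 1
    return best, mean_errors

# Hypothetical error per boosting round for each of the five folds.
fold_errors = [
    [0.20, 0.12, 0.08, 0.07, 0.09],
    [0.22, 0.14, 0.09, 0.08, 0.08],
    [0.18, 0.11, 0.09, 0.06, 0.10],
    [0.21, 0.13, 0.07, 0.07, 0.09],
    [0.19, 0.10, 0.08, 0.06, 0.08],
]
best_round, mean_errors = best_iteration(fold_errors)
```

Stopping at the round where the held-out error is lowest, rather than where the training error is lowest, is what limits overfitting.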

Uniform Manifold Approximation and Projection (UMAP) representations
The residuals ε i,j were represented for visual clustering using UMAP, a nonlinear dimension reduction technique 50. We retained the residuals associated with features with a classification gain (in their cumulative sum) > 0.75 in the importance matrix associated with the XGBoost model. A k (local neighborhood size) value of 15 was used. A cosine metric was introduced to compute distances in high dimensional spaces: the effective minimal distance between embedded points was 10⁻⁶. The three conditions of UMAP, namely uniform distribution, local constancy of the Riemannian metric and local connectivity, were verified. UMAP analyses were performed using the package umap in R 51 (Fig. 1).

Population description
Between 1998 and 2023, we included 1448 frontal and lateral facial photographs, corresponding to 634 patients. The mean age was 7.2 ± 4.2 years and ranged from 0 to 40.2 years; 52% were girls. Ethnicity was 92% Caucasian, 6% African or Caribbean, and 3% Asian.
The control group comprised 1084 photographs, corresponding to 527 patients with a mean age of 7.0 ± 4.6 years. Fifty-four percent were girls and ethnicities were 93% Caucasian, 5% African/Caribbean, and 2% Asian.
The KS group comprised 364 photographs, corresponding to 107 patients with a mean age of 7.8 ± 6.7 years. Forty-two percent were girls and ethnicities were 85% Caucasian, 7% African/Caribbean, and 8% Asian. Seventy-eight percent of patients were KS1 (Table 1).
Two patients had a genetically confirmed diagnosis of KS, but we had no information on the causal gene. We thus collected information on genetic variation for 105 KS individuals, of whom 82 (78%) and 23 (22%) had variations in KMT2D (KS1) and KDM6A (KS2), respectively.

Phenotype
We confirmed the usual characteristics described in KS: high and arched eyebrows, long palpebral fissures, and large and prominent ears (Fig. 2).

Classification (design №1: KS versus controls)
We were able to distinguish KS vs controls in the independent validation group with an accuracy of 95.8% (78.9-99.9%, p < 0.001). AUCs were comparable in the training set (0.994) and in the validation set (0.993) (Fig. 3, Table 3).
Ten out of eleven patients were correctly predicted as KS with our model, and this performance was the same using Face2Gene CLINIC (Supp. Table 1). In addition, we were able to predict all control patients (Fig. 4, Table 4).
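The reported accuracy and its interval are numerically consistent with an exact binomial (Clopper-Pearson) 95% interval for 23 correct predictions out of 24, the validation count given in the Discussion. Whether this is the method the authors used is an assumption; the sketch below only shows how such an interval can be obtained, with hypothetical function names.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve(f, target, increasing):
    """Bisection on [0, 1] for f(p) = target, with f monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion x / n."""
    lower = 0.0 if x == 0 else _solve(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, True)
    upper = 1.0 if x == n else _solve(
        lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper

accuracy = 23 / 24
ci_low, ci_high = clopper_pearson(23, 24)
```

The width of this interval reflects the small size of the validation group rather than any instability of the model.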

Classification (design №2: KS1 versus KS2)
The model was able to distinguish KS1 from KS2 with an empirical AUC of 0.805 (0.729-0.880, p < 0.001) (Figs. 6, 7).This trend was found in the validation group, with an accuracy of 70% without reaching the significance threshold (Tables 5 and 6).
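An empirical AUC can be read as the proportion of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The sketch below illustrates this definition; the function name and the scores are hypothetical and are not data from the study.

```python
def empirical_auc(scores_pos, scores_neg):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted probabilities of KS1 for KS1 and KS2 patients.
ks1_scores = [0.9, 0.8, 0.7, 0.4]
ks2_scores = [0.6, 0.3, 0.4]
auc = empirical_auc(ks1_scores, ks2_scores)
```

A value of 0.5 corresponds to chance-level ranking, which helps when reading AUCs whose interval includes 0.5.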

Design №3: PTV vs PAV in KS1
The model was unable to detect a difference in facial phenotype between KS1 patients with a protein-truncating variant (PTV) and KS1 patients with a protein-altering variant (PAV) (0.555 [0.419-0.690], p = 0.786) (Fig. 8).

Discussion
The model we report distinguished KS from controls in the independent validation group with an accuracy of 95.8% (78.9-99.9%, p < 0.001). Only 1 patient out of 24 was classified as 'control' while she had KS (accuracy 96%). In the KS group, 10 out of 11 patients were correctly classified (accuracy 91%). Using the Face2Gene CLINIC tool on KS patients (because DeepGestalt technology is not capable of recognizing non-syndromic patients), 1 patient out of 11 could not be analyzed and could not be classified as KS (accuracy 91%). Performances were therefore comparable. Interestingly, the patient not recognized by our model and by Face2Gene CLINIC was of African ethnicity, highlighting the lack of training data for non-Caucasian patients. The distribution of ethnic groups varies greatly from one center to another, which is why we believe it is important to encourage international collaborations in the field of Next Generation Phenotyping. The model we report was also able to distinguish KS1 from KS2 with an empirical AUC of 0.805 (0.729-0.880, p < 0.001). Rouxel et al. 5 showed that the Face2Gene RESEARCH tool distinguished KS1 from KS2 in a cohort of 66 patients with an AUC of 0.722 (p = 0.022). The same team showed a classification accuracy of 61% (20/33) by clinical genetics experts between KS1 and KS2. The performance of our model was at least comparable to Face2Gene RESEARCH and seemed to outperform that of clinical experts.
Rouxel et al. 5 explained that KS1 patients had a longer face and nose, a thin upper lip vermilion and a longer midface in comparison to KS2 patients, who have a rounder face, a thicker vermilion and anteverted nostrils. Our study reports new phenotypic features not seen on frontal images alone for KS2, such as a particular morphology of the external ear, longer along the vertical axis and with counter-clockwise rotation. Phenotype-genotype correlations have been reported in KS for extra-facial anomalies. Cardiovascular abnormalities, namely ventricular septal defects, coarctation of the aorta, atrial septal defects, bicuspid aortic valve, patent ductus arteriosus, and hypoplastic left heart syndrome 52–55, are more prevalent in KS2 compared to KS1 1,56. Persistent hypoglycemia due to pituitary hormone deficiency, adrenal insufficiency, growth hormone deficiency and dysregulated insulin secretion by the pancreatic β-cells 57,58 is also more frequent in KS2 10, possibly because the inhibition of KDM6A increases the release of insulin from pancreatic islet cells, as suggested by mouse models 1,59. Urinary tract anomalies, such as horseshoe kidneys and renal hypoplasia, seem to be more frequent in KS1, and genital defects such as cryptorchidism and hypospadias could be more frequent in KS2 56,60,61.
Rouxel et al. 5 underlined the lack of Asian patients in their evaluation, and proposed that larger series were needed to better define phenotypical differences between KS1 and KS2, and the general dependence of the phenotype on ethnicity 6,12. The collaboration with an Asian clinical genetics center (Bangkok) is thus a strong point of this study.
The use of textural feature extraction allowed our model to account for typical KS characteristics not recognized by geometric analysis (Procrustes) alone. The lateral sparsening of the eyebrows and heavy lashes giving the impression of make-up eyes were thus included in the classification.
Barry et al. 1 reported a large meta-analysis including 152 articles and 1369 individuals with KS and assessed the prevalence of the different types of pathogenic variation per gene. The majority of KMT2D variants were truncating (non-sense 34%, frameshift 34%), then missense (23%) and finally splice site variants (9%). The majority of KDM6A variants were truncating (frameshift 36% > non-sense 27%), followed by splice site (20%), and missense (18%). We found similar results, with a higher prevalence of truncating non-sense variants for both genes. There was a higher prevalence of splice donor site variants, with 26% for KMT2D and 30% for KDM6A. Some authors report more severe clinical outcomes in patients with non-sense variants than in patients with a frameshift variant 1. Faundes et al. 56 found more severe neurodevelopmental anomalies in patients with protein-truncating mutations in the KS2 group. Shah et al. 62 reported ophthalmological anomalies such as strabismus, blue sclerae, microphthalmia and refractive anomalies that were more severe in patients with a non-sense variant, and less frequent in patients with a frameshift variant. Our model did not find any significant difference in facial phenotype between PTV and PAV.

Conclusion
Here we report an automatic detection model for KS including the face, profiles and ears, with performances (AUC 0.993 and accuracy 95.8%) comparable to those of Face2Gene, on an independent validation set. These performances were achieved using an international cohort of 107 patients with a confirmed molecular diagnosis of KS. Using the same model, we were able to separate patients with KS1 (KMT2D) from KS2 (KDM6A), with an AUC of 0.805. These results seem to at least outperform Face2Gene and support the possibility of using a phenotype-first strategy to diagnose KS and detect its two causal genes.

Figure 2. Average shapes in KS and controls and comparisons after Procrustes superimposition of frontal views, profile views, and external ears for three age groups. Blue = controls, Dark red = KS.

Figure 4. Classification using design №1 for proband 3 of the validation set. (A) and (B) Frontal and profile views of proband 3. (C) UMAP representation of the training data according to the two groups, with positioning of proband 3. (D) Histogram of predictions by the model. This child was also detected as KS by Face2Gene CLINIC. KS, Kabuki Syndrome.

Figure 5. Average shapes in KS1 and KS2 and comparisons after Procrustes superimposition of frontal views, lateral views, and external ears for three age groups. Orange = KS1, Dark red = KS2.

Figure 7. Classification using design №2 for two probands of the training set. (A, B, E and F) Frontal and profile views of the two probands. (C and G) UMAP representations of the training data according to the two groups, with positioning of probands 3. (D and H) Histograms of predictions by the model. The phenotype included a reduced height of the midface, a thicker upper lip, and a vertical elongation of the external ear in the KS2 group (E and F). KS, Kabuki Syndrome.

Table 5. Classification performances for design №2 (KS1 versus KS2) in the validation group. Significant values are in [italics]. AUC, area under the curve.